WHAT IS CLAIMED IS:

1. A computer-implemented system for performing data mining 
applications, comprising: 

(a) a computer having one or more data storage devices connected thereto,
wherein a relational database is stored on one or more of the data storage devices;

(b) a relational database management system, executed by the computer, for 
accessing the relational database stored on the data storage devices; and 

(c) an analytic application programming interface (API) that generates a set
of scalable data mining functions, executed by the computer, for performing data
mining operations directly within the database management system.

2. The system of claim 1 above, wherein the computer comprises a
parallel processing computer comprised of a plurality of nodes, and each node
executes one or more threads of the relational database management system to
provide parallelism in the data mining operations.

3. The system of claim 1, wherein the scalable data mining functions
process data collections stored in the relational database and produce results that are
stored in the relational database.

4. The system of claim 1, wherein the scalable data mining functions
are created by parameterizing and instantiating the analytic API.

5. The system of claim 1, wherein the scalable data mining functions
comprise queries for execution by the relational database management system. 


6. The system of claim 5, wherein the scalable data mining functions
are dynamically generated queries comprised of combined phrases with
values substituted therein based on parameters supplied to the analytic API.

7. The system of claim 6, wherein the scalable data mining functions
are selected from a group of functions comprising Data Description functions, Data
Derivation functions, Data Reduction functions, Data Reorganization functions, 
Data Sampling functions, and Data Partitioning functions. 
8. The system of claim 7, wherein the Data Description functions
comprise descriptive statistical functions. 

9. The system of claim 7, wherein the Data Description functions are
selected from a group comprising:

(1) descriptive statistics for one or more numeric columns, wherein the
statistics are selected from a group comprising count, minimum,
maximum, mean, standard deviation, standard mean error, variance,
coefficient of variation, skewness, kurtosis, uncorrected sum of
squares, corrected sum of squares, and quantiles,

(2) a count of values for a column,

(3) a calculated modality for a column,

(4) one or more binned numeric columns of counts, with overlay and
statistics options,

(5) one or more automatically sub-binned numeric columns giving
additional counts and isolated frequently occurring individual values,

(6) a computed frequency of one or more column values,

(7) a computed frequency of values for pairs of columns in a column list,

(8) a Pearson Product-Moment Correlation matrix,

(9) a Covariance matrix,

(10) a sum of squares and cross-products matrix, and

(11) a count of overlapping column values in one or more combinations
of tables.
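The statistics listed in item (1) of claim 9 can be illustrated with a minimal Python sketch. It assumes the sample (n-1) variance and the simple moment-ratio definitions of skewness and excess kurtosis, since the claim does not fix these definitions; the function name `describe` is hypothetical, and quantiles are omitted for brevity.

```python
from __future__ import annotations

import math


def describe(values: list[float]) -> dict[str, float]:
    """Descriptive statistics for one numeric column (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    uss = sum(v * v for v in values)              # uncorrected sum of squares
    css = sum((v - mean) ** 2 for v in values)    # corrected sum of squares
    variance = css / (n - 1)                      # sample variance (assumption)
    std = math.sqrt(variance)
    m2 = css / n                                  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "count": n,
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean,
        "std_dev": std,
        "std_error_mean": std / math.sqrt(n),
        "variance": variance,
        "coeff_of_variation": std / mean if mean != 0 else math.nan,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else math.nan,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else math.nan,  # excess kurtosis
        "uncorrected_ss": uss,
        "corrected_ss": css,
    }
```

Within the database, each of these sums would instead be a SQL aggregate, so only the small result row is returned.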

10. The system of claim 7, wherein the Data Derivation functions
provide column derivations or transformations.

11. The system of claim 7, wherein the Data Derivation functions are
selected from a group comprising:

(1) a derived binned numeric column, wherein a new column is the bin
number,

(2) an n-valued categorical column dummy-coded into n 0/1 values,

(3) an n-valued categorical column recoded into n or fewer new values,

(4) one or more numeric columns scaled via a range transformation,

(5) one or more columns scaled to a z-score, that is, a number of standard
deviations from a mean,

(6) one or more numeric columns scaled via a sigmoidal transformation
function,

(7) one or more numeric columns scaled via a base 10 logarithm
function,

(8) one or more numeric columns scaled via a natural logarithm
function,

(9) one or more numeric columns scaled via an exponential function,

(10) one or more numeric columns raised to a specified power,

(11) one or more numeric columns derived via a user-defined
transformation function,

(12) one or more new columns derived by ranking one or more columns
or expressions based on order,

(13) one or more new columns derived as quantile numbers 0 to n-1 based on
order and n,

(14) a cumulative sum of a value expression based on a sort expression,

(15) a moving average of a value expression based on a width and order,

(16) a moving sum of a value expression based on a width and order,

(17) a moving difference of a value expression based on a width and
order,

(18) a moving linear regression value derived from an expression, width,
and order,

(19) a multiple account/product ownership bitmap,

(20) a product ownership bitmap over multiple time periods,

(21) one or more counts, amounts, percentage means, and intensities
derived from a transaction summary,

(22) one or more variabilities derived from transaction summary data,

(23) one or more derived trigonometric values and their inverses,
including sin, arcsin, cos, arccos, csc, arccsc, sec, arcsec, tan, arctan,
cot, and arccot, and

(24) one or more derived hyperbolic values and their inverses, including
sinh, arcsinh, cosh, arccosh, csch, arccsch, sech, arcsech, tanh,
arctanh, coth, and arccoth.
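Several of the derivations in claim 11 reduce to short formulas. The Python sketch below shows three of them: dummy coding (item (2)), z-score scaling (item (5)), and a moving average over an ordered column (item (15)). The function names are hypothetical, the z-score assumes the sample standard deviation, and the moving average assumes a trailing window that averages over fewer rows at the start of the sequence.

```python
from __future__ import annotations

import math


def dummy_code(values: list[str]) -> dict[str, list[int]]:
    """Dummy-code an n-valued categorical column into n 0/1 indicator columns."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def z_scores(values: list[float]) -> list[float]:
    """Number of (sample) standard deviations each value lies from the mean."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]


def moving_average(values: list[float], width: int) -> list[float]:
    """Trailing moving average over `width` rows, in the given (already sorted) order."""
    out: list[float] = []
    window_sum = 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= width:
            window_sum -= values[i - width]  # drop the row that left the window
        out.append(window_sum / min(i + 1, width))
    return out
```

For instance, `moving_average([1, 2, 3, 4], 2)` gives `[1.0, 1.5, 2.5, 3.5]`; in SQL the same result is usually expressed with a windowed aggregate ordered by the sort expression.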
12. The system of claim 7, wherein the Data Reduction functions
provide matrix building operations to reduce the amount of data required for
analytic algorithms.



13. The system of claim 7, wherein the Data Reduction functions are
selected from a group comprising:

(1) build one or more data reduction matrices from a group comprising:
(i) a Pearson Product-Moment Correlations matrix; (ii) a Covariances
matrix; and (iii) a Sum of Squares and Cross Products (SSCP) matrix,

(2) export a resultant matrix, and

(3) restart a matrix operation.
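The matrices in claim 13 are related: a single pass over the data that accumulates an SSCP matrix, with a leading column of 1s so that the row count and the column sums are included, is enough to derive both the covariance and the Pearson correlation matrices. A minimal Python sketch with hypothetical function names follows.

```python
from __future__ import annotations

import math


def sscp(rows: list[list[float]]) -> list[list[float]]:
    """Sum of squares and cross products over columns augmented with a leading 1."""
    k = len(rows[0]) + 1
    m = [[0.0] * k for _ in range(k)]
    for row in rows:
        x = [1.0, *row]
        for i in range(k):
            for j in range(k):
                m[i][j] += x[i] * x[j]
    return m


def covariance_and_correlation(
    m: list[list[float]],
) -> tuple[list[list[float]], list[list[float]]]:
    """Derive the sample covariance and Pearson correlation matrices from an SSCP matrix."""
    n = m[0][0]                      # sum of 1*1 over all rows = row count
    p = len(m) - 1
    sums = [m[0][i + 1] for i in range(p)]
    cov = [[(m[i + 1][j + 1] - sums[i] * sums[j] / n) / (n - 1) for j in range(p)]
           for i in range(p)]
    corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]
    return cov, corr
```

Because every entry of the SSCP matrix is a plain sum, matrices built over separate subsets of rows (for example, on different nodes) can be added together before the final step, and a large table is reduced to a small k-by-k matrix.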



14. The system of claim 7, wherein the Data Reorganization functions 
provide an ability to reorganize data by joining or de-normalizing pre-processed 
results into a wide analytic data set. 

15. The system of claim 7, wherein the Data Reorganization functions
are selected from a group comprising:

(1) create a denormalized new table by removing one or more key
columns, and

(2) join a plurality of tables or views into a combined result table.
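Item (2) of claim 15 joins pre-processed tables into one combined result, the wide analytic data set of claim 14. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in database; the table and column names (`demog`, `txn_summary`, `analytic_ds`) are hypothetical.

```python
import sqlite3

# In-memory stand-in database with two hypothetical pre-processed tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demog (cust_id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE txn_summary (cust_id INTEGER PRIMARY KEY, total_amt REAL);
INSERT INTO demog VALUES (1, 34), (2, 51);
INSERT INTO txn_summary VALUES (1, 120.5), (2, 80.0);
-- Join the tables on their shared key into one wide result table.
CREATE TABLE analytic_ds AS
  SELECT d.cust_id, d.age, t.total_amt
  FROM demog d JOIN txn_summary t ON d.cust_id = t.cust_id;
""")
rows = conn.execute("SELECT * FROM analytic_ds ORDER BY cust_id").fetchall()
```

The join runs entirely inside the database, and the combined table is stored there for later analysis.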

16. The system of claim 7, wherein the Data Sampling function provides
an ability to construct a new table containing a randomly selected subset of the
rows in an existing table or view.

17. The system of claim 7, wherein the Data Sampling function selects one
or more data samples of specified sizes from a table.
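Claim 17 selects one or more samples of specified sizes. The following minimal Python sketch assumes that several requested samples should not overlap; `multi_sample` is a hypothetical name rather than the claimed function.

```python
from __future__ import annotations

import random


def multi_sample(rows: list, sizes: list[int], seed: int | None = None) -> list[list]:
    """Draw several non-overlapping random samples with the requested sizes."""
    if sum(sizes) > len(rows):
        raise ValueError("requested sample sizes exceed the number of rows")
    rng = random.Random(seed)
    shuffled = rng.sample(rows, len(rows))  # random permutation; input is untouched
    samples, start = [], 0
    for size in sizes:
        samples.append(shuffled[start:start + size])  # consecutive, disjoint slices
        start += size
    return samples
```

Slicing one random permutation guarantees that the samples are disjoint, and each one is a uniform random subset of the rows.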



18. The system of claim 7, wherein the Data Partitioning function 
provides an ability to construct a new table containing at least one randomly 
selected subset of the rows in an existing table or view, wherein the subsets are
mutually distinct but all-inclusive subsets of data. 
19. The system of claim 7, wherein the Data Partitioning function selects
one or more data partitions from a table using a database internal hashing
technique.
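Claims 18 and 19 require partitions that are mutually distinct yet together include every row, assigned by hashing. The Python sketch below substitutes an MD5 digest of the key for the database's internal row hash, which it does not reproduce; `hash_partition` is a hypothetical name.

```python
from __future__ import annotations

import hashlib
from collections.abc import Iterable


def hash_partition(keys: Iterable[object], num_partitions: int) -> list[list[object]]:
    """Assign each key to exactly one partition based on a hash of the key."""
    parts: list[list[object]] = [[] for _ in range(num_partitions)]
    for key in keys:
        # MD5 stands in for the database's internal hash; unlike Python's
        # built-in hash() for strings, it is stable across processes.
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % num_partitions
        parts[bucket].append(key)
    return parts
```

Because each key lands in exactly one bucket, the partitions are disjoint and their union is the whole input, and the same key always maps to the same partition.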

20. The system of claim 1, wherein results of the data mining operations 
are stored in the relational database.
21. The system of claim 1, wherein the relational database management
system further comprises an analytical logical data model that stores metadata and
processing results from the scalable data mining functions.
22. A method for performing data mining applications, comprising:

(a) storing a relational database on one or more data storage devices
connected to a computer;

(b) accessing the relational database stored on the data storage devices using a
relational database management system; and

(c) utilizing a comprehensive set of parameterized analytic capabilities for
performing data mining operations directly within a massively parallel relational
database management system.



23. An article of manufacture comprising logic embodying a method for
performing data mining applications, comprising:

(a) storing a relational database on one or more data storage devices
connected to a computer;

(b) accessing the relational database stored on the data storage devices using a
relational database management system; and

(c) utilizing a comprehensive set of parameterized analytic capabilities for
performing data mining operations directly within a massively parallel relational
database management system.





